Autoset nprocs and improve max_blocks estimate #949
Conversation
This looks pretty good. A couple thoughts.
The modifications I'm working on handle the nprocs thing in much the same way, but also refactor the max_blocks calculation so each proc (MPI task) has a locally defined max_blocks that matches the decomposition exactly. I'm still developing and testing to make sure it'll work properly. I'm happy to merge this as a first step (if the CESMCOUPLED thing is fixed). I could then update the implementation if I make additional progress. Thoughts?
Let's go with this. I think I made the related changes to do this. I guess we should add some tests?
Thanks Anton.
This change allows nprocs to be set to -1 in 'ice_in' and then the number of processors will be automatically detected.
This change re-orders the calculation of max_blocks to ensure nprocs is set before the calculation.
This description reflects the latest state of the PR, super.
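A minimal sketch of the ordering described above, assuming get_num_procs() returns the MPI communicator size; the program, variable names, and the stand-in function below are illustrative only, not the actual ice_domain code:

   program nprocs_order_sketch
      ! Hedged illustration: resolve nprocs first, then estimate max_blocks from it.
      use mpi
      implicit none
      integer :: ierr, my_rank, nprocs

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)

      nprocs = -1                               ! as if read from ice_in
      if (nprocs < 0) nprocs = get_num_procs()  ! autodetect the number of MPI tasks

      ! only after this point would max_blocks be estimated, using the resolved nprocs

      if (my_rank == 0) print *, 'nprocs resolved to ', nprocs
      call MPI_Finalize(ierr)

   contains

      integer function get_num_procs()
         ! stand-in for CICE's query of the MPI communicator size
         integer :: ierr_loc
         call MPI_Comm_size(MPI_COMM_WORLD, get_num_procs, ierr_loc)
      end function get_num_procs

   end program nprocs_order_sketch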
@@ -242,16 +257,6 @@ subroutine init_domain_blocks
         !*** domain size zero or negative
         !***
         call abort_ice(subname//' ERROR: Invalid domain: size < 1', file=__FILE__, line=__LINE__) ! no domain
      else if (nprocs /= get_num_procs()) then
         !***
         !*** input nprocs does not match system (eg MPI) request
We could keep this comment, I think.
Isn't this clear from the ERROR message?
Can we avoid whitespace-only lines here? Also, I would think a single blank line is sufficient.
I made it one line instead of two :)
@@ -370,6 +370,7 @@ domain_nml
   "``maskhalo_bound``", "logical", "mask unused halo cells for boundary updates", "``.false.``"
   "``max_blocks``", "integer", "maximum number of blocks per MPI task for memory allocation", "-1"
   "``nprocs``", "integer", "number of processors to use", "-1"
   "", "``-1``", "find number of processors automatically", ""
I think while we're in here, "MPI ranks" would be more accurate than "processors" (for both the new line and the existing one).
We seem to use "processors" throughout the rest of the docs, although they seem light on details about MPI in general.
I'm definitely out of my depth here, but Google implies the rank is just the unique name for each process? So this should be the number of "processes", as it could be possible to have multiple processes per processor! (And by this logic, we should change all the other "processors" to "processes" in the docs.)
I think "MPI tasks" is technically correct here. Processors and processes are both wrong. "MPI ranks" is probably the same thing, but not terminology we generally use in the document. Lets go with "MPI tasks"?
Is "Number of MPI tasks (typically 1 per processor) to use" good?
There are lots of uses of "processor" in the docs, I guess these could be changed to task? Should I make a new issue?
"typically 1 per processor" would not be a good thing to add. The way I think about this is that processors are the hardware in use, while tasks and threads are how those processors are used. So nprocs is specifically associated with the number of MPI tasks only. It has nothing to do with threads. If you are running 16 MPI tasks threaded 2 ways each, you are using 32 processors and have 16 MPI tasks. nprocs is MPI tasks and has nothing to do with the "processors" or "processes" used/set. To me, processes is a word separate from hardware. If you could oversubscribe threads on a processor, you might have 1 processor but 4 processes on that processor.
Again, this is how I think about things and probably how most of the documentation uses that terminology. I fully acknowledge that my ideas about this may not be the norm. In that case, we should update the documentation using more standard language. But before we do, we should carefully think about the terminology. At this point, I would just update this one line of documentation unless you see other glaring errors. We could open an issue, but before we do, I'd like someone to review the documentation (at least a few sections here and there) to see whether there seems to be a bunch of incorrect terminology. I prefer to have an issue that says "several things need to be fixed, such as ..." rather than "review documentation for correct/consistent use of tasks, threads, processors, processes".
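To illustrate the distinction, here is a standalone sketch (not CICE code) that assumes an MPI library and OpenMP are available; launched as, say, 16 tasks with OMP_NUM_THREADS=2, it reports 32 processors in use, matching the example above:

   program tasks_vs_threads
      ! Each MPI rank is one task; OpenMP threads subdivide the work of a task.
      use mpi
      use omp_lib
      implicit none
      integer :: ierr, ntasks, my_rank, nthreads

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)    ! number of MPI tasks
      call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
      nthreads = omp_get_max_threads()                    ! threads available per task

      if (my_rank == 0) then
         print *, 'MPI tasks        : ', ntasks
         print *, 'threads per task : ', nthreads
         ! assuming one thread per core and no oversubscription,
         ! this product is the number of processors (cores) in use
         print *, 'processors in use: ', ntasks*nthreads
      endif

      call MPI_Finalize(ierr)
   end program tasks_vs_threads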
Thanks Tony. I'll just make it say "Number of MPI tasks". I am a bit out of my depth re the best terminology for the documentation, so I'll leave it for now.
I think you still need to update the documentation here. Will start some testing on the changes.
I made the update. I wonder if it no longer makes sense when compiling with the 'serial' folder instead of the 'mpi' folder?
#ifdef CESMCOUPLED
      nprocs = get_num_procs()
#else
      if (nprocs == -1) then
Let's make this (nprocs < 0)?
Ok done
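For reference, a sketch of how the block above presumably reads after this change; the body of the branch and the closing directives are inferred from the PR description rather than copied from the diff:

#ifdef CESMCOUPLED
      nprocs = get_num_procs()
#else
      if (nprocs < 0) then
         nprocs = get_num_procs()   ! presumed body: autodetect the MPI task count
      endif
#endif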
I will run a full test suite once comments are addressed and the PR is no longer draft. Thanks!
My question here is whether we should add some test cases for
Co-authored-by: anton-seaice <[email protected]>
Force-pushed from ab0dcb3 to 56304b7
Please run a test manually with nprocs=-1 to make sure it works. I think the question is maybe whether we should have nprocs=-1 be the default setup for all tests. You could try that and see if everything runs and is bit-for-bit. I think you'd need to set nprocs=-1 in configuration/scripts/ice_in and then remove "nprocs = ${task}" in cice.setup. Is that the direction we want to go?
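For a manual check along those lines, the relevant ice_in entries would presumably look something like this; only the two settings discussed here are shown, with names taken from the domain_nml table edited above:

&domain_nml
    nprocs     = -1    ! autodetect the number of MPI tasks
    max_blocks = -1    ! let the model estimate max_blocks
/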
Our intermittent failure is back! https://github.com/CICE-Consortium/CICE/actions/runs/9024916341/job/24799676376 failed, but they are the same commit.
I restarted the failed GitHub Action and it passed. Ugh....
Full test suite run on derecho with intel, all passed.
PR checklist
Short (1 sentence) summary of your PR:
Contributes to "Can we remove nprocs from ice_in?" #945. This improves setting nprocs and max_blocks automatically.
Developer(s):
@anton-seaice @minghangli-uni
Suggest PR reviewers from list in the column to the right.
@apcraig
Please copy the PR test results link or provide a summary of testing completed below.
Needs doing
How much do the PR code changes differ from the unmodified code?
Does this PR create or have dependencies on Icepack or any other models?
Does this PR update the Icepack submodule? If so, the Icepack submodule must point to a hash on Icepack's main branch.
Does this PR add any new test cases?
Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
This change allows nprocs to be set to -1 in 'ice_in' and then the number of processors will be automatically detected.
This change improves the automatic calculation of max_blocks to give a better (but still not foolproof) estimate of max_blocks if it is not set in ice_in.
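As a rough illustration of the kind of estimate meant here, a hypothetical ceiling-division formula with assumed variable names, not the exact expression in the code:

   integer :: nblocks_tot, max_blocks
   ! blocks needed to cover the global grid, spread as evenly as possible
   ! across the MPI tasks and rounded up
   nblocks_tot = ((nx_global-1)/block_size_x + 1) * ((ny_global-1)/block_size_y + 1)
   max_blocks  = (nblocks_tot - 1)/nprocs + 1   ! ceiling(nblocks_tot / nprocs)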